Rethinking FPGA Computing with a Many-Core Approach

Authors

  • John Wawrzynek
  • Mingjie Lin
  • Ilia Lebedev
  • Shaoyi Cheng
  • Daniel Burke
Abstract

While ASIC design and manufacturing costs soar with each new technology node, the computing power and logic capacity of modern FPGAs steadily advance. High-performance computing with FPGA-based systems therefore becomes increasingly attractive and viable. Unfortunately, truly unleashing the computing potential of FPGAs often requires cumbersome HDL programming and laborious manual optimization. To circumvent these challenges, we propose a Many-core Approach to Reconfigurable Computing (MARC) that (i) allows programmers to easily express parallelism through a high-level programming language, (ii) supports coarse-grain multithreading and dataflow-style fine-grain threading while permitting bit-level resource control, and (iii) greatly reduces the effort required to repurpose the hardware system for different algorithms or applications. Leveraging a many-core architectural template, sophisticated logic synthesis techniques, and state-of-the-art compiler optimization technology, a MARC system enables efficient high-performance computing for applications expressed in imperative programming languages such as C/C++. It does so by exploiting abundant special FPGA resources, such as distributed block memories and DSP blocks, to implement complete single-chip, high-efficiency many-core microarchitectures. To quantitatively validate the proposed MARC system, we implemented a MARC prototype machine consisting of one control processing core and 32 arithmetic processing cores on a Virtex-5 (XCV5LX155T-2) FPGA. For a well-known general-purpose Bayesian computing problem, we compare the throughput and runtime of this MARC machine, with fully synthesized application-specific processing cores, against a manually optimized FPGA implementation, BCM (Bayesian Computing Machine) [1]. As the problem sizes range from 10 to 10, this MARC machine achieves 8.13 GFLOPS in throughput on average, which is 43% of that of BCM but with much less design and implementation effort and much greater portability and retargetability. More importantly, we developed a simple analytical performance model to explain the performance discrepancy between the MARC machine and the hand-optimized BCM FPGA implementation [1].
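The throughput figures quoted in the abstract can be put side by side with a little arithmetic. The sketch below is only a back-of-the-envelope reading of those numbers, not the analytical performance model the authors mention. The function names are hypothetical, the BCM figure is implied by the 43% ratio rather than stated in the abstract, and the per-core value assumes the average throughput is spread evenly over the 32 arithmetic cores (the control core is ignored).

```python
# Figures quoted in the abstract.
MARC_GFLOPS = 8.13            # average MARC throughput
MARC_FRACTION_OF_BCM = 0.43   # MARC throughput as a fraction of BCM's
ARITHMETIC_CORES = 32         # arithmetic processing cores in the prototype


def implied_bcm_gflops(marc_gflops: float, fraction: float) -> float:
    """BCM throughput implied by 'MARC is `fraction` of BCM' (not stated in the source)."""
    return marc_gflops / fraction


def per_core_gflops(total_gflops: float, cores: int) -> float:
    """Average throughput per arithmetic core, ASSUMING an even split across cores."""
    return total_gflops / cores


if __name__ == "__main__":
    bcm = implied_bcm_gflops(MARC_GFLOPS, MARC_FRACTION_OF_BCM)
    core = per_core_gflops(MARC_GFLOPS, ARITHMETIC_CORES)
    print(f"Implied BCM throughput: {bcm:.1f} GFLOPS")
    print(f"Average per arithmetic core: {core:.3f} GFLOPS")
```

Under these assumptions the hand-optimized BCM design would sit at roughly 19 GFLOPS, and each MARC arithmetic core would contribute about a quarter of a GFLOPS on average.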

Similar resources

A Literature Review on Cloud Computing Security Issues

The use of Cloud Computing has increased rapidly in many organizations. Cloud Computing provides many benefits in terms of low cost and accessibility of data. In addition, Cloud Computing was predicted to transform the computing world from using local applications and storage into centralized services provided by organizations [10]. Ensuring the security of Cloud Computing is a major factor in the Clo...

Neuro-fuzzy control of bilateral teleoperation system using FPGA

This paper presents an adaptive neuro-fuzzy controller, ANFIS (Adaptive Neuro-Fuzzy Inference System), for a bilateral teleoperation system based on an FPGA (Field Programmable Gate Array). The proposed controller combines the learning capabilities of neural networks with the inference capabilities of fuzzy logic to adapt to dynamic variations in the master and slave robots and to guarantee good prac...


FPGA Implementation of a Hammerstein Based Digital Predistorter for Linearizing RF Power Amplifiers with Memory Effects

Power amplifiers (PAs) are inherently nonlinear elements, and digital predistortion is a highly cost-effective approach to linearizing them. Although most existing architectures assume that the PA has a memoryless nonlinearity, memory effects of the PAs in many applications, such as wideband code-division multiple access (WCDMA) or orthogonal frequency-division multiplexing (OFDM), can no longer b...


Implementation of VLSI Based Image Compression Approach on Reconfigurable Computing System - A Survey

Image data require huge amounts of disk space and large bandwidths for transmission. Hence, image compression is necessary to reduce the amount of data required to represent a digital image. Therefore, an efficient technique for image compression is in high demand. Although lots of compression techniques are available, the technique which is faster, memory efficient and simple, surely...

Publication date: 2010